Discovering statistically non-redundant subgroups
Authors
Abstract
The objective of subgroup discovery is to find groups of individuals who are statistically different from others in a large data set. Most existing measures of subgroup quality are intuitive and do not precisely capture the statistical difference between a group and the rest of the data, and the results they produce contain many redundant subgroups. The odds ratio is a statistically sound measure of the difference between two groups with respect to a certain outcome, which makes it well suited to quantifying the quality of subgroups. In this paper, we propose a statistically sound framework for statistically non-redundant subgroup discovery: the quality of a subgroup is measured by its odds ratio, and statistically non-redundant subgroups are defined by the error bounds of odds ratios. We show that the proposed method is faster than most existing methods and discovers the complete set of statistically non-redundant subgroups.
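To make the two ingredients named in the abstract concrete, the following is a minimal sketch of an odds ratio with error bounds computed from a 2x2 contingency table. It assumes the standard log-based (Woolf) 95% confidence interval. The function names, the counts, and in particular the redundancy rule (a more specific subgroup is treated as redundant when its odds ratio falls inside the error bounds of its more general parent) are illustrative assumptions, not the definitions used in the paper.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence bounds for a 2x2 table.

    a: inside the subgroup, outcome present
    b: inside the subgroup, outcome absent
    c: outside the subgroup, outcome present
    d: outside the subgroup, outcome absent
    z: normal quantile (1.96 gives roughly a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio), Woolf's method.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


def is_redundant(general, specific):
    """Hypothetical rule (an assumption, not the paper's definition):
    the specific subgroup adds nothing if its odds ratio lies within
    the error bounds of the more general subgroup."""
    _, g_lower, g_upper = general
    s_ratio, _, _ = specific
    return g_lower <= s_ratio <= g_upper


# Made-up counts for a data set of 200 records, 30 with the outcome.
general = odds_ratio_ci(20, 80, 10, 90)      # parent subgroup
similar = odds_ratio_ci(12, 38, 18, 132)     # specialisation, similar odds ratio
distinct = odds_ratio_ci(9, 3, 21, 167)      # specialisation, much larger odds ratio

print(general)
print(is_redundant(general, similar))
print(is_redundant(general, distinct))
```

Under this assumed rule, only the second specialisation would be reported, because its odds ratio lies outside the parent's error bounds.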
Similar resources
Local Density Sven
We survey computational approaches and solutions for discovering locally dense groups in social and complex networks. A graph-theoretical property is local if it is definable over subgraphs induced by the groups only. In this context, we discuss perfectly dense groups (cliques), structurally dense groups (plexes, cores), and statistically dense groups (η-dense groups). We give algorithms (and h...
Discovering Non-Redundant Association Rules using MinMax Approximation Rules
Dept. of Comp. Sci. & Eng., Vaagdevi College of Eng., Warangal, India. Abstract: Frequent pattern mining is an important area of data mining used to generate association rules. The quality of the extracted frequent patterns is a major concern, because mining generates huge sets of rules, many of which are redundant. Mining Non-Redundant Frequent patterns is a big concern in the area of Ass...
Handling large databases in data mining
M. Mehdi Owrang O., American University, Dept. of Computer Science & IS, Washington DC 20016. Abstract: Current database technology involves processing a large volume of data in order to discover new knowledge. The high volume of data makes the discovery process computationally expensive. In addition, real-world databases tend to be incomplete, redundant, and inconsistent that could...
ASIC Design of Butterfly Unit Based on Non-Redundant and Redundant Algorithm
Fast Fourier Transform (FFT) processors with a pipeline architecture consist of a series of Processing Elements (PE) or Butterfly Units (BU). The BU or PE of an FFT performs multiplication and addition on complex numbers. This paper proposes a single BU that computes a radix-2, 8-point FFT in the time domain as well as the frequency domain by replacing a series of PEs. This BU comprises fused floating ...
Journal title:
- Knowl.-Based Syst.
Volume 67, Issue
Pages -
Publication date 2014